NCHLT: isiNdebele POS tag set
For annotation purposes, this tag set is largely taken over from
Taljard et al. (2008) and from various documents compiled
by G. Faasz and U. Heid (IMS, Stuttgart) and D.J.
Prinsloo and E. Taljard (University of Pretoria). The
information below describes the current state of the tagset;
further development will probably require changes.
The tagset is based mainly on the lexical and
morphological criteria defined by Lombard (1985) and Louwrens (1991). Its
logical structure is divided into two
layers of linguistic description (annotation levels):
The first annotation level (level 1) includes all mandatory (in EAGLES
terms, obligatory) information, namely up to three elements: one
indicating the word class, a second specifying functional or
syntactic properties, and a third giving morphological specifics, e.g. PRO(noun)EMP(hatic)PERS(on).
The second annotation level (level 2) includes recommended and
optional information. In most cases this level is used for a detailed
description of closed-class items described in the tagger lexicon. Compare the
following excerpt:
Figure 1: Annotation levels
Description | Tag 1st level (mandatory information) | Tag 2nd level (optional/recommended information) |
Pronouns: | | |
emphatic personal | PROEMPPERS | 1sg,2sg,1pl,2pl |
Verbals: | V | tr |
Morphemes: | | |
deficient | MORPH | def |
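The two annotation levels can be separated mechanically. The following is a minimal sketch, assuming that the level 2 information is appended to the level 1 tag with an underscore, as in the per-tag listings of this document (e.g. V_tr, PROEMPPERS_1sg). The names `Tag`, `split_levels` and `to_level1` are hypothetical helpers introduced for illustration only.

```python
from typing import NamedTuple, Optional


class Tag(NamedTuple):
    """A POS tag split into its two annotation levels (hypothetical helper)."""
    level1: str             # mandatory part, e.g. "PROEMPPERS"
    level2: Optional[str]   # optional/recommended part, e.g. "1sg", or None


def split_levels(tag: str) -> Tag:
    """Split a tag at the first underscore into level 1 and level 2.

    Assumption: level 2 is joined to level 1 with "_", as in "V_tr".
    """
    level1, sep, level2 = tag.partition("_")
    return Tag(level1, level2 if sep else None)


def to_level1(tag: str) -> str:
    """Reduce a tag to its mandatory level 1 part."""
    return split_levels(tag).level1


if __name__ == "__main__":
    for example in ("PROEMPPERS_1sg", "V_tr", "MORPH_def", "V"):
        print(example, "->", split_levels(example))
```

Reducing every tag with `to_level1` gives a level 1 only view of an annotation, which matches a corpus in which only level 1 has been annotated.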
For disjunctive languages, all linguistic words are tagged in addition
to all orthographic words, resulting in two layers of POS
annotation: one for orthographic words and one for linguistic words.
For conjunctive languages, this extra layer of POS annotation is not needed.
The tagset currently distinguishes 20
categories applicable to isiNdebele and two levels of annotation.
However, only level 1 has been annotated. The first part of the tag gives a
general indication of the nature of the unit in question. The categories are as follows:
Tag | Explanation |
PUNC | Punctuation |
ABBR | Abbreviation (incl. acronyms) |
ADJ | Adjective (incl. enumerative) |
ADV | Adverb |
CDEM | Class-indicating demonstrative |
CONJ | Conjunction |
COP | Copulative (copulative subject concord, demonstrative copulative, copulative verb) |
FOR | Foreign |
IDEO | Ideophone |
INT | Interjection |
INTER | Question word |
N | Noun |
NPP | Place and brand name |
NUM | Numerative |
POSS | Possessive (possessive concord, possessive pronoun) |
PROEMP | Emphatic pronoun |
PROQUANT | Quantitative pronoun |
REL | Relative |
V | Verbal |
VAUX | Auxiliary verb |
Tags not applicable to isiNdebele |
ASP | Aspectual marker |
AUX | Auxiliary stem |
CN | Class-indicating nominal prefix |
CO | Class-indicating object concord |
CS | Class-indicating subject concord |
MNEG | Negative morpheme |
PART | Particle |
TENS | Tense marker |
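The category inventory above can be held in a small lookup so that the general part of a tag can be checked against it. This is a sketch only: the tag names are copied from the two lists above, while the constant and function names (`APPLICABLE`, `NOT_APPLICABLE`, `category_status`) are hypothetical and not part of the tagset.

```python
# Categories applicable to isiNdebele, as listed in the table above.
APPLICABLE = frozenset({
    "PUNC", "ABBR", "ADJ", "ADV", "CDEM", "CONJ", "COP", "FOR", "IDEO",
    "INT", "INTER", "N", "NPP", "NUM", "POSS", "PROEMP", "PROQUANT",
    "REL", "V", "VAUX",
})

# Categories listed as not applicable to isiNdebele.
NOT_APPLICABLE = frozenset({
    "ASP", "AUX", "CN", "CO", "CS", "MNEG", "PART", "TENS",
})


def category_status(category: str) -> str:
    """Classify a general category tag against the two lists above."""
    if category in APPLICABLE:
        return "applicable"
    if category in NOT_APPLICABLE:
        return "not applicable"
    return "unknown"


if __name__ == "__main__":
    print(len(APPLICABLE), "applicable categories")
    for cat in ("CDEM", "TENS", "XYZ"):
        print(cat, "->", category_status(cat))
```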
Level 1: PUNC
; | PUNC |
( | PUNC |
! | PUNC |
“ | PUNC |
Level 1: ABBR
isib | ABBR |
NGO | ABBR |
Level 1: ADJ01-11, ADJ14-15, ADJ01a, ADJ02a, ADJLOC
omunye | ADJ01 |
elikhulu | ADJ05 |
komunye | ADJLOC |
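The compact listings used for class-indicating tags (e.g. "ADJ01-11, ADJ14-15, ADJ01a, ADJ02a, ADJLOC") can be expanded into explicit tags. The sketch below assumes that a range such as 01-11 stands for every two-digit class number from 01 to 11 inclusive; this reading and the function name `expand_tag_spec` are assumptions made for illustration.

```python
import re
from typing import List

# Matches items such as "ADJ01-11": an upper-case prefix plus a numeric range.
_RANGE = re.compile(r"([A-Z]+)(\d{2})-(\d{2})")


def expand_tag_spec(spec: str) -> List[str]:
    """Expand a comma-separated tag listing into a list of explicit tags.

    Assumption: "PREFIXaa-bb" means PREFIXaa, ..., PREFIXbb (inclusive,
    zero-padded to two digits). All other items are kept unchanged.
    """
    tags: List[str] = []
    for item in spec.split(","):
        item = re.sub(r"\s+", "", item)  # tolerate stray spaces inside an item
        if not item:
            continue
        match = _RANGE.fullmatch(item)
        if match:
            prefix, low, high = match.group(1), int(match.group(2)), int(match.group(3))
            tags.extend(f"{prefix}{n:02d}" for n in range(low, high + 1))
        else:
            tags.append(item)
    return tags


if __name__ == "__main__":
    print(expand_tag_spec("ADJ01-11, ADJ14-15, ADJ01a, ADJ02a, ADJLOC"))
```

Under this reading the adjective listing expands to 16 tags, with no ADJ12 or ADJ13. The same function applies to the CDEM, N, POSS, PROEMP and PROQUANT listings.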
Level 1: ADV, ADVLOC
kanye | ADV |
phambili | ADV |
engaphasi | ADVLOC |
Level 1: CDEM01-11, CDEM14-15, CDEMLOC
labo | CDEM02 |
loyo | CDEM03 |
lapho | CDEMLOC |
Level 1: CONJ
namkha | CONJ |
ukuba | CONJ |
Level 1: COP
Level 2: COP_neg, COP_nil (-be, -bê and -bilê).
For the copulative verb stem -se, the tag COP_neg is used on level 2, as it is
for the verb stem -be (< -ba) when it is used in the negative form.
yiKomidi | COP |
kube | COP |
Level 1: FOR
provincial | FOR |
systems | FOR |
Level 1: IDEO
godu | IDEO |
yeke | IDEO |
Level 1: INT
Level 2: INT_neg, INT_nil
na | INT |
nekomo | INT |
Level 1: INTER
Level 2: _man, _time, _loc, _N01a, _N02a
na | INTER |
bunjani | INTER |
mangaki | INTER |
Level 1: N01-11, N14-15, N01a, N02a, NLOC, N00
Level 2: _aug, _dim, _loc, _name, _nil
umuntu | N01 |
abomma | N02a |
imiphumela | N04 |
iphrojekthi | N05 |
amalunga | N06 |
isiqhema | N07 |
indawo | N09 |
mayelana | N00 |
emsebenzini | NLOC |
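The noun listing above can be turned into a simple well-formedness check. The pattern below is a sketch under two assumptions: that the ranges N01-11 and N14-15 cover every two-digit class number in those intervals, and that a level 2 suffix (_aug, _dim, _loc, _name, _nil) may follow any level 1 noun tag. The function name `is_valid_noun_tag` is hypothetical.

```python
import re

# Level 1: N01-11, N14-15, N01a, N02a, NLOC, N00
# Level 2 (optional, assumed to attach to any noun tag): _aug, _dim, _loc, _name, _nil
_NOUN_TAG = re.compile(
    r"N(?:0[1-9]|1[01]|1[45]|0[12]a|LOC|00)"
    r"(?:_(?:aug|dim|loc|name|nil))?"
)


def is_valid_noun_tag(tag: str) -> bool:
    """Return True if the tag matches the noun listing (under the stated assumptions)."""
    return _NOUN_TAG.fullmatch(tag) is not None


if __name__ == "__main__":
    for tag in ("N01", "N02a", "NLOC", "N00", "N09_dim", "N12", "N09_tr", "NPP"):
        print(tag, "->", is_valid_noun_tag(tag))
```

Because the whole tag must match, NPP (place and brand name) is not mistaken for a noun tag even though it begins with N.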
Level 1: NPP
Level 2: NPP_place, NPP_brand
KwaZulu-Natal | NPP |
Mars | NPP |
Level 1: NUM
2.2 | NUM |
2005 | NUM |
74(a) | NUM |
Level 1: POSS01-11, POSS14-15, POSSLOC, POSSPERS, POSSKA
Level 2: POSSPERS_1pl, POSSPERS_2pl
wephrojekhti | POSS01 |
sokutlama | POSS07 |
kamasipala | POSSKA |
Level 1: PROEMP01-11, PROEMP14-15, PROEMPLOC, PROEMPPERS
Level 2: PROEMPPERS_1sg, PROEMPPERS_1pl, PROEMPPERS_2sg, PROEMPPERS_2pl
bona | PROEMP02 |
kizo | PROEMPLOC |
khona | PROEMP15 |
Level 1: PROQUANT01-11, PROQUANT14-15, PROQUANTLOC
boke | PROQUANT02 |
zoke | PROQUANT10 |
koke | PROQUANT15 |
Level 1: REL
angeze | REL |
elibanzi | REL |
esingaba | REL |
Level 1: V
Level 2: V_tr, V_itr, V_dtr
babe | V |
ukwakha | V |
inikelwe | V |
Level 1: VAUX
Level 2: VAUX_tr, VAUX_itr, VAUX_dtr
ibe | VAUX |
ukungabi | VAUX |